Abstract 



A multivariate version of Spearman's rho for testing independence is considered. Its 
asymptotic efficiency is calculated under a general distribution model specified by the 
dependence function. An efficiency comparison with other multivariate Spearman-type 
test statistics is carried out. Conditions for Pitman optimality of the test are established. 
Examples that illustrate the quality of the multivariate Spearman's test are included. 

1 Introduction 

Testing for independence among the components of an m-variate random vector is an important statistical 
problem. There is an extensive statistical literature on this topic. Over the last two decades a 
variety of new multivariate measures of association have been suggested, including those based 
on ranks, and their properties have been studied. 

Let $X_i = (X_{i1},\ldots,X_{im})$, $m \ge 2$, $i = 1,\ldots,n$, be independent random vectors with 
absolutely continuous cdf $F$ and marginal cdfs $F_1,\ldots,F_m$. Denote by $R_{ij}$ the rank of $X_{ij}$ among 
$X_{1j},\ldots,X_{nj}$, $j = 1,\ldots,m$. In the case of a bivariate random sample, when $m = 2$, a 
commonly used statistic for testing the hypothesis of independence, $H_0 : F = F_1F_2$, is Spearman's 
correlation coefficient $\rho_n$ that estimates the functional 

$$\rho(F) = 12\int_{\mathbb{R}^2} F\,dF_1\,dF_2 - 3. \qquad (1)$$

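Among various multivariate extensions of Spearman's rho available in the statistical literature, 
the following three statistics seem to be quite popular (see, for example, [9], [18], [20], [22]): 

$$S_{m,n} = \frac{1}{C_{m,n}}\Big\{\frac{1}{n}\sum_{i=1}^{n}\prod_{k=1}^{m}(n+1-R_{ik}) - \Big(\frac{n+1}{2}\Big)^{m}\Big\}, \qquad (2)$$

$$W_{m,n} = \frac{1}{C_{m,n}}\Big\{\frac{1}{n}\sum_{i=1}^{n}\prod_{k=1}^{m}R_{ik} - \Big(\frac{n+1}{2}\Big)^{m}\Big\}, \qquad (3)$$

$$V_{m,n} = \binom{m}{2}^{-1}\sum_{1\le j<j'\le m}\frac{1}{C_{2,n}}\Big\{\frac{1}{n}\sum_{i=1}^{n}R_{ij}R_{ij'} - \Big(\frac{n+1}{2}\Big)^{2}\Big\}, \qquad (4)$$

where $C_{m,n} = n^{-1}\sum_{i=1}^{n} i^{m} - ((n+1)/2)^{m}$ is a normalizing factor. 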
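The three rank statistics can be computed directly from the component-wise ranks. The sketch below is illustrative only: it assumes the forms $S_{m,n}$ (products of $n+1-R_{ik}$), $W_{m,n}$ (products of $R_{ik}$) and $V_{m,n}$ (average of the pair-wise versions with $m=2$), all centred by $((n+1)/2)^m$ and scaled by $C_{m,n}$, and it assumes there are no ties. The function names are hypothetical.

```python
from itertools import combinations
from math import prod

def ranks(column):
    """Ranks 1..n of the entries of one coordinate (no ties assumed)."""
    order = sorted(range(len(column)), key=column.__getitem__)
    r = [0] * len(column)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def c_norm(m, n):
    """Normalizing factor C_{m,n} = n^{-1} sum_i i^m - ((n+1)/2)^m."""
    return sum(i ** m for i in range(1, n + 1)) / n - ((n + 1) / 2) ** m

def normalized(products, m, n):
    """Centre the mean of the rank products and scale it by C_{m,n}."""
    return (sum(products) / n - ((n + 1) / 2) ** m) / c_norm(m, n)

def spearman_stats(sample):
    """Return (S, W, V) for a list of n observations with m coordinates each."""
    n, m = len(sample), len(sample[0])
    R = [ranks([obs[j] for obs in sample]) for j in range(m)]  # R[j][i] = rank of X_ij
    S = normalized([prod(n + 1 - R[j][i] for j in range(m)) for i in range(n)], m, n)
    W = normalized([prod(R[j][i] for j in range(m)) for i in range(n)], m, n)
    pairwise = [normalized([R[j][i] * R[k][i] for i in range(n)], 2, n)
                for j, k in combinations(range(m), 2)]
    V = sum(pairwise) / len(pairwise)  # average pair-wise Spearman's rho
    return S, W, V
```

For $m = 2$ all three values coincide with the classical Spearman's rho, and for a sample whose coordinates are all ranked identically each statistic equals 1.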

Statistic (4) is simply the average pair-wise Spearman's rho [11, Ch. 6] that estimates [9] 

$$V_m(F) = 12\binom{m}{2}^{-1}\sum_{1\le j<j'\le m}\int_{\mathbb{R}^m} F_j F_{j'}\,dF - 3.$$



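Statistics (2) and (3) are natural generalizations of Spearman's rho, as they are sample counterparts 
of the functionals 

$$S_m(F) = \frac{1}{d_m}\Big\{\int_{\mathbb{R}^m} F\,dF_1\ldots dF_m - c_m\Big\}$$

and 

$$W_m(F) = \frac{1}{d_m}\Big\{\int_{\mathbb{R}^m} F_1\ldots F_m\,dF - c_m\Big\},$$

where $c_m = 2^{-m}$, $d_m = (m+1)^{-1} - 2^{-m}$, respectively. The correspondence between $S_{m,n}$, the 
main object under investigation in this paper, and $S_m(F)$ is easy to see. Indeed, let $F_n$ be the 
multivariate empirical cdf that corresponds to $F$, and let $F_{j,n}$ be the marginal empirical cdfs 
based on $X_{1j},\ldots,X_{nj}$, $j = 1,\ldots,m$. Then 

$$R_{ij} = \sum_{k=1}^{n} I(X_{kj} \le X_{ij}) = nF_{j,n}(X_{ij}) = (n+1)F^{*}_{j,n}(X_{ij}),$$

where $F^{*}_{j,n} = (n/(n+1))F_{j,n}$ are the modified empirical cdfs. Therefore $S_{m,n}$ can be written in 
the form 

$$S_{m,n} = \frac{(n+1)^m}{C_{m,n}}\Big\{\int_{\mathbb{R}^m}\prod_{j=1}^{m}\big(1 - F^{*}_{j,n}(x_j)\big)\,dF_n(x_1,\ldots,x_m) - 2^{-m}\Big\},$$

where, taking into account that $\sum_{i=1}^{n} i^m \sim n^{m+1}/(m+1)$, as $n \to \infty$, we have 
$(n+1)^m/C_{m,n} \to (1/(m+1) - 1/2^m)^{-1} = 1/d_m$. 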
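The limit of the normalizing ratio can be checked numerically. The following sketch (function names are hypothetical) evaluates $(n+1)^m d_m/C_{m,n}$ in exact rational arithmetic, so that cancellation in $C_{m,n}$ does not affect the result; the value should approach 1 as $n$ grows.

```python
from fractions import Fraction

def c_norm_exact(m, n):
    """C_{m,n} = n^{-1} sum_i i^m - ((n+1)/2)^m, as an exact fraction."""
    return Fraction(sum(i ** m for i in range(1, n + 1)), n) - Fraction(n + 1, 2) ** m

def d_const(m):
    """d_m = 1/(m+1) - 2^{-m}."""
    return Fraction(1, m + 1) - Fraction(1, 2 ** m)

def scaled_ratio(m, n):
    """(n+1)^m / C_{m,n} multiplied by d_m; tends to 1 as n grows."""
    return float(Fraction((n + 1) ** m) / c_norm_exact(m, n) * d_const(m))
```

For $m = 2$ the ratio equals $(n+1)/(n-1)$ exactly, which makes the $O(1/n)$ rate of convergence visible.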
Thanks to the Glivenko-Cantelli theorem (see, for example, [3, Sec. 1.4, Th. 1]) the closeness of 
$S_{m,n}$ and $S_m(F)$ is now immediately seen by noting that 

$$\int_{\mathbb{R}^m}\prod_{j=1}^{m}\big(1 - F_j(x_j)\big)\,dF(x_1,\ldots,x_m) = \int_{\mathbb{R}^m} F(x_1,\ldots,x_m)\prod_{j=1}^{m}dF_j(x_j). \qquad (5)$$

Equality (5) is easy to verify by integrating by parts on the left-hand side and using the properties 
of a multivariate cdf. 

All three measures of multivariate concordance, $V_m(F)$, $S_m(F)$, and $W_m(F)$, increase with 
respect to the multivariate concordance ordering introduced by Joe [9, Sec. 2], [10, Ch. 2]. 
This ordering is based on the concept of positive orthant dependence [10, Sec. 2.1]. It results 
from a comparison of a multivariate random vector with a random vector of independent random 
variables having the same univariate marginal distributions. More precisely, let $F$ and $G$ be two 
m-variate cdfs with corresponding survival functions $\bar F$ and $\bar G$, i.e. $\bar F(x_1,\ldots,x_m) = P_F(X_1 > x_1,\ldots,X_m > x_m)$ 
and $\bar G(x_1,\ldots,x_m) = P_G(Y_1 > x_1,\ldots,Y_m > x_m)$. Then $G$ is said to be more 
concordant than $F$ (written $F \prec_c G$) if 

$$F(\mathbf{x}) \le G(\mathbf{x}) \quad\text{and}\quad \bar F(\mathbf{x}) \le \bar G(\mathbf{x}), \quad\text{for all } \mathbf{x} = (x_1,\ldots,x_m) \in \mathbb{R}^m.$$

That is, if $X = (X_1,\ldots,X_m) \sim F$, $Y = (Y_1,\ldots,Y_m) \sim G$, and $F \prec_c G$, then the components of 
$Y$ are more likely than those of $X$ to take on small and large values simultaneously. As shown in 
[S], $F \prec_c G$ implies $V_m(F) \le V_m(G)$ and $W_m(F) \le W_m(G)$. The fact that $S_m(F)$ is also increasing 
with respect to $\prec_c$ follows immediately from Lemma 3.3.1 of [9], the Remark below this lemma, 
and equality (5). 

Unlike the classical problem of testing independence when m = 2, there is still no clear 
concept of negative multivariate concordance. Some "characterizations" of negative 
multivariate concordance can be found, for example, in [S]. 

In connection with testing independence among the components of an m-variate random vector, 
statistics (2)-(4) were studied by several authors. One of the earliest comprehensive studies related 
to multivariate rank statistics for testing independence can be found in [19]. The Pitman efficiency 
properties of the tests based on (2)-(4) are investigated in [5], [18], [12], among others. The 
asymptotic normality of statistics (2)-(4) is established in [20] under rather weak assumptions 
on the underlying distribution. The asymptotic efficiency study of the Spearman-type tests, 
including those based on $S_{m,n}$ and $V_{m,n}$, is conducted in [18] under various distribution models. 
A thorough study of the Pitman efficiency properties of $W_{m,n}$ and $V_{m,n}$ is done in [22]. 

An interesting problem related to finding the Pitman efficiency of a test is to discover the 
structure of the underlying distribution for which the test is Pitman optimal. Many test statistics 
were suggested by their authors empirically for solving particular problems of testing hypotheses, 
and were supposed to work in one or another particular situation. Problems of finding the most 
favourable alternatives have been studied in [5], [5], [TH Ch. 6], [17], [12], etc. In this paper, 
assuming the one-parameter model 

$$F_\theta(\mathbf{x}) = \prod_{j=1}^{m}F_j(x_j) + \theta\,\Omega_m\big(F_1(x_1),\ldots,F_m(x_m)\big), \quad \mathbf{x} = (x_1,\ldots,x_m) \in \mathbb{R}^m, \qquad (6)$$

where $\theta$ is a parameter of association close to zero and $\Omega_m$ is the dependence function defined on 
the unit m-cube and satisfying certain boundary and smoothness conditions, we find the most 
favourable alternative to independence for which the test based on $S_{m,n}$ is Pitman optimal. 

In order to determine the "optimal" distribution function one has to solve a variational 
problem of minimization of an appropriate functional on a set of special type, depending on the 
structure of the test statistic. Typically, optimality conditions for tests are found by using the 
Lagrange multiplier rule applied to a functional on a Banach space. Under the validity of model 
(6), the optimality problems for the Spearman-type test statistics $W_{m,n}$ and $V_{m,n}$ have been solved 
in [22]. Compared to these two cases, the extremal problem related to $S_{m,n}$ is much more 
complicated and is reduced to solving a system of partial differential equations with non-standard 
boundary conditions. In Section 4.2 we provide a solution to a general m-dimensional extremal 
problem that gives Pitman optimality conditions for the sequence $\{S_{m,n}\}_{n\ge 1}$. 

In Section 2 we introduce the statistical model and describe its properties. Some basic properties 
of the test statistic $S_{m,n}$, including its asymptotic normality in terms of the dependence function, 
are given in Section 3. The asymptotic efficiency study is performed in Section 4. The key result of 
the paper, the Theorem of Section 4.3, provides the most favourable alternative to independence 
for the test statistic at hand. 

2 Multivariate model 
2.1 Definition of the model 

Suppose we observe an m-variate random sample $X_1,\ldots,X_n$ of size n from a distribution $P_\theta$ on 
the measurable space $(\mathbb{R}^m,\mathcal{B}^m)$ indexed by a parameter $\theta \ge 0$. Then the full observation is a 
single observation from the product $P_\theta^n$ of n copies of $P_\theta$. Let $F_\theta$ be the distribution function 
that corresponds to $P_\theta$. When testing independence among the components of a continuously 
distributed random vector, without loss of generality, the marginal cdfs $F_j$, $j = 1,\ldots,m$, can 
be taken to be uniform on the interval [0, 1]. Then, the statistical model is described in 
terms of distribution functions as the collection of probability measures $\{P_\theta^n : \theta \ge 0\}$ on the 
sample space $(\mathbb{R}^{m\times n},\mathcal{B}^{m\times n})$ such that 

$$F_\theta(\mathbf{x}) = \prod_{j=1}^{m}x_j + \theta\,\Omega_m(\mathbf{x}), \quad \mathbf{x} = (x_1,\ldots,x_m) \in [0,1]^m = I^m, \quad m \ge 2, \qquad (7)$$

is satisfied for sufficiently small values of $\theta$ subject to some restrictions on $\Omega_m$. To be precise, let 
$\mathcal{F}_m = \{F_\theta\}$ be the class of absolutely continuous cdfs of type (7) for which, cf. [22, Sec. 2], 

(C1) $\Omega_m(\mathbf{x}) \ge 0$, $\mathbf{x} \in I^m$, 
(C2) $\Omega_m(\mathbf{x})|\ldots$ 

(C3) $\Omega_m(\mathbf{x})|_{x_k=1} = \Omega_{m-1}(x_1,\ldots,x_{k-1},x_{k+1},\ldots,x_m)$, $1 \le k \le m$, 
(C4) there exists a non-zero mixed derivative 

$$\omega_m(\mathbf{x}) = \frac{\partial^m\Omega_m(\mathbf{x})}{\partial x_{i_1}\ldots\partial x_{i_m}}, \quad\text{for } \lambda_m\text{-almost all } \mathbf{x} \in I^m,$$

such that $\omega_m \in L_2(I^m)$, where $\lambda_m$ is the Lebesgue measure on $(\mathbb{R}^m,\mathcal{B}^m)$ and $(i_1,\ldots,i_m)$ is an 
arbitrary permutation of the set $\{1,\ldots,m\}$. 

Due to (C1), boundary conditions (C2), and the consistency property (C3), for sufficiently 
small $\theta$, all the properties of a multivariate cdf are satisfied. The regularity condition (C4) 
implies local asymptotic normality of the sequence of models $\{P_\theta^n : \theta \ge 0\}$ at $\theta = 0$ (see Section 
2.2 for details). In the sequel, the m-variate sample $X_1,\ldots,X_n$ is assumed to be taken from a distribution 
for which the cdf $F_\theta(\mathbf{x})$ belongs to $\mathcal{F}_m$, $m \ge 2$. The symbols $E_\theta$ and $\mathrm{Var}_\theta$ (with index n omitted) 
are used below to denote the expectation and the variance with respect to $P_\theta^n$. 

We are interested in testing the hypothesis of independence 

$$H_0 : \theta = 0$$

against the one-sided alternative 

$$H_1 : \theta > 0.$$

In the case m = 2, model (7) was first studied by Farlie [4] and appeared later in a number 
of publications (see [22, Sec. 1] for references), sometimes with a specific choice of dependence 
function. Considered under assumptions (C1)-(C4), model (7) is an extension of the Farlie model 
to the multivariate case. 

2.2 Local asymptotic normality of the model 

Recall that a sequence of statistical models is locally asymptotically normal (LAN) if it converges 
to a Gaussian model whose properties are well known [8, Sec. 2], [23, Sec. 7]. 
Let $f_\theta$ be the density of $P_\theta$ with respect to $\lambda_m$, that is, 

$$f_\theta(\mathbf{x}) = 1 + \theta\,\omega_m(\mathbf{x}), \quad \mathbf{x} \in I^m, \quad m \ge 2,$$

and denote by $\dot f_\theta(\mathbf{x})$ its partial derivative with respect to $\theta$. The true statistical difficulty is to 
distinguish between the null hypothesis and the alternative when $\theta$ is small, typically "of size 
$O(n^{-1/2})$." Therefore we introduce a local parameter $h = \sqrt{n}\,\theta$ and consider a local statistical 
experiment indexed by h: 

$$(X_1,\ldots,X_n) \sim \{P^n_{h/\sqrt{n}} : h \ge 0\}.$$

Our attention will be focused on the performance of the test based on $S_{m,n}$ at alternatives 

$$H_{1n} : h > 0$$

converging, as $n \to \infty$, to the null hypothesis 

$$H_0 : h = 0.$$

Let $\Delta_{n,\theta}$ be a random vector such that $\Delta_{n,\theta} \xrightarrow{d} N(0, I_\theta)$, where for $\theta = \theta_n = h/\sqrt{n}$, 

$$I_\theta = E_\theta\Big(\frac{\partial}{\partial\theta}\log\big(dP_\theta/d\lambda_m\big)\Big)^2 = \int_{I^m}\frac{\omega_m^2(\mathbf{x})}{1+\theta\,\omega_m(\mathbf{x})}\,d\mathbf{x}, \quad \theta \ge 0,$$

is the Fisher information in the parametric family $\{f_\theta(\mathbf{x}), \theta \ge 0\}$. Thanks to Theorem 1.1 of [8], 
under the regularity condition (C4), the sequence of statistical experiments $\{P^n_{h/\sqrt{n}} : h \ge 0\}$ is 
LAN at the point h = 0, that is, for any $h \ge 0$, 

$$\log\frac{dP^n_{h/\sqrt{n}}}{dP^n_0} = h\Delta_{n,0} - \frac{1}{2}h^2 I_0 + o_{P_0}(1), \quad n \to \infty. \qquad (8)$$

Under local asymptotic normality 

$$\log\frac{dP^n_{h/\sqrt{n}}}{dP^n_0} \xrightarrow{d} N\Big(-\frac{1}{2}h^2 I_0,\ h^2 I_0\Big), \quad n \to \infty,$$

and hence the sequences of distributions $\{P^n_{h/\sqrt{n}}\}$ and $\{P^n_0\}$ are mutually contiguous (see [23, 
Sec. 7.5]). This fact allows us to obtain, by means of Le Cam's third lemma [23, Sec. 6.7], the limit 
distribution of $S_{m,n}$ under the sequence of alternatives $H_{1n}$, once the limit distribution under $H_0$ 
is known. Another useful consequence of the local asymptotic normality is the existence of an 
upper bound on the asymptotic power function of the test. This makes it possible to establish 
the conditions for asymptotic optimality of the test statistic $S_{m,n}$ (see [22, Ch. 15]). 

3 Basic properties and asymptotic normality 

In this section we list some basic properties of the test statistic $S_{m,n}$. First, note that $S_{m,n}$ is 
symmetric in m variables. It is normalized so that its value is 1 when $R_{i1} = R_{i2} = \ldots = R_{im}$, or 
equivalently, $F_\theta = \min(F_1,\ldots,F_m)$ (perfect positive dependence), and its expected value under 
$H_0$ is zero. The lower bound of $S_m(F)$ is equal to 

$$S_m\big(\max(F_1+\ldots+F_m-m+1,\,0)\big) = \frac{2^m(m+1)}{2^m-(m+1)}\Big\{\frac{1}{(m+1)!}-\frac{1}{2^m}\Big\},$$

which is −1 for m = 2 and is greater than −1 for $m \ge 3$. The lower bound is an increasing 
function of m tending to zero as m gets larger. Hence $S_{m,n}$, the sample version of $S_m(F)$, also 
exceeds −1 and its lower bound tends to zero as m increases. For this reason, it is appropriate to 
use the statistic $S_{m,n}$ for testing the hypothesis of independence $H_0 : \theta = 0$ against the one-sided 
alternative $H_1 : \theta > 0$ only. The asymmetry between the upper and lower bounds is due to 
the "curse of dimensionality" and is partly explained by the inequality [10, Lemma 3.8] 

$$\max(F_1+\ldots+F_m-m+1,\,0) \le F_\theta \le \min(F_1,\ldots,F_m),$$

where in contrast to the Frechet upper bound, $\min(F_1,\ldots,F_m)$, the Frechet lower bound, 
$\max(F_1+\ldots+F_m-m+1,\,0)$, is generally not a cdf, except for the case m = 2. Through the curse of 
dimensionality, the concepts of perfect positive and perfect negative dependence lose the symmetry 
of the two-dimensional case. 
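The behaviour of the lower bound is easy to tabulate. The sketch below evaluates the closed-form expression $2^m(m+1)/(2^m-(m+1))\,\{1/(m+1)! - 2^{-m}\}$ exactly; the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial

def spearman_lower_bound(m):
    """Value of S_m at the Frechet lower bound, as an exact fraction."""
    scale = Fraction(2 ** m * (m + 1), 2 ** m - (m + 1))
    return scale * (Fraction(1, factorial(m + 1)) - Fraction(1, 2 ** m))

# Lower bounds for m = 2, ..., 6: starts at -1 and increases towards 0.
bounds = [spearman_lower_bound(m) for m in range(2, 7)]
```

The first two values are $-1$ and $-2/3$, which illustrates how quickly the range of negative values shrinks with the dimension.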

There exist a variety of theorems on asymptotic normality of multivariate linear rank statistics. 
A unifying approach to these various results is given, for example, in [19]. In particular, Theorem 
2 of [19] implies that $S_{m,n}$ is asymptotically normally distributed. For our purpose, however, it is 
more convenient to establish asymptotic normality of $S_{m,n}$ through the correspondence between 
$S_{m,n}$ and a closely related U-statistic. 

The multivariate rank statistic $S_{m,n}$ is asymptotically equivalent to an $(m+1)$-dimensional 
U-statistic, $U_{m,n}$, based on $\int F_\theta\,dF_1\ldots dF_m$. The kernel of the U-statistic comes from symmetrizing 
$I(X_{m+1,j} \le X_{jj},\ j = 1,\ldots,m)$, cf. El eq. (3.4)]: 

$$U_{m,n} = \binom{n}{m+1}^{-1}\sum_{1\le i_1<\ldots<i_{m+1}\le n} g(X_{i_1},\ldots,X_{i_{m+1}}),$$

where 

$$g(X_1,\ldots,X_{m+1}) = \frac{1}{(m+1)!}\sum_{(i_1,\ldots,i_{m+1})}\big(I(X_{i_{m+1},1} \le X_{i_1,1},\ldots,X_{i_{m+1},m} \le X_{i_m,m}) - c_m\big)/d_m,$$

and the summation is extended over all permutations $(i_1,\ldots,i_{m+1})$ of $\{1,\ldots,m+1\}$. 

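The following result establishes "locally uniform" asymptotic normality of $U_{m,n}$. It will be 
used for calculating the slope (or efficacy) of the test statistic $S_{m,n}$ whose limit distribution 
coincides with that of $U_{m,n}$. 

Lemma 1. If $F \in \mathcal{F}_m = \{F_{h/\sqrt{n}}\}$, then for all $h \ge 0$ and $\theta_n = h/\sqrt{n}$, 

$$\frac{\sqrt{n}\,\big(U_{m,n} - \mu_m(\theta_n)\big)}{\sigma_m(\theta_n)} \xrightarrow{d} N(0,1), \quad n \to \infty,$$

where 

$$\mu_m(\theta) = \theta\,\frac{2^m(m+1)}{2^m-(m+1)}\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}, \qquad \sigma_m^2(\theta) = \frac{(m+1)^2}{(2^m-(m+1))^2}\Big(\Big(\frac{4}{3}\Big)^m - \frac{m}{3} - 1\Big).$$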



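Proof. For $\theta \ge 0$, put 

$$\eta_m(\theta) = E_\theta\Psi_\theta^2(X_1) - (E_\theta U_{m,n})^2, \qquad \Psi_\theta(\mathbf{x}) = E_\theta\big(g(X_1,\ldots,X_{m+1})\,|\,X_1 = \mathbf{x}\big).$$

By the CLT for U-statistics (see, for example, [13, Sec. 4.2]) for all $\theta \ge 0$, 

$$n^{1/2}\big((m+1)^2\eta_m(\theta)\big)^{-1/2}\big(U_{m,n} - E_\theta U_{m,n}\big) \xrightarrow{d} N(0,1), \quad n \to \infty, \qquad (9)$$

provided $\eta_m(\theta) > 0$ and $E_\theta g^2(X_1,\ldots,X_{m+1}) < \infty$. Note that 

$$E_\theta U_{m,n} = E_\theta\,g(X_1,\ldots,X_{m+1}) = E_\theta\big\{I(X_{m+1,1} \le X_{11},\ldots,X_{m+1,m} \le X_{mm}) - c_m\big\}/d_m$$

$$= \Big(\int F_\theta(\mathbf{x})\,dF_1(x_1)\ldots dF_m(x_m) - c_m\Big)\Big/d_m = S_m(F_\theta).$$

In particular, $E_0U_{m,n} = 0$. Next, under $H_0$, 

$$\Psi_0(\mathbf{x}) = \frac{1}{(m+1)!}\sum_{(i_1,\ldots,i_{m+1})}\big\{E_0\big(I(X_{i_{m+1},1} \le X_{i_1,1},\ldots,X_{i_{m+1},m} \le X_{i_m,m})\,|\,X_1 = \mathbf{x}\big) - c_m\big\}/d_m$$

$$= \frac{1}{d_m}\Big\{\frac{m!}{(m+1)!}\Big(\prod_{j=1}^{m}(1-x_j) + 2^{-(m-1)}\sum_{j=1}^{m}x_j\Big) - c_m\Big\} = \frac{1}{2^m-(m+1)}\Big\{2\sum_{j=1}^{m}x_j + 2^m\prod_{j=1}^{m}(1-x_j) - (m+1)\Big\}.$$

Under model (7), the calculation of $\eta_m(0) = E_0\Psi_0^2(X_1)$ can be simplified by noting that in case 
of independence, the vectors $\mathbf{1} - X_1$ and $X_1$ are equally distributed, each with i.i.d. uniform 
components. Therefore 

$$\eta_m(0) = E_0\Psi_0^2(\mathbf{1} - X_1) = \frac{1}{(2^m-(m+1))^2}\,E_0\Big(2\sum_{j=1}^{m}(1-X_{1j}) + 2^m\prod_{j=1}^{m}X_{1j} - (m+1)\Big)^2 = \frac{1}{(2^m-(m+1))^2}\Big(\Big(\frac{4}{3}\Big)^m - \frac{m}{3} - 1\Big).$$

From this, applying (9) we get under $H_0$ 

$$\sqrt{n}\,(\sigma_m(0))^{-1}\big(U_{m,n} - \mu_m(0)\big) \xrightarrow{d} N(0,1), \quad n \to \infty,$$

where 

$$\sigma_m^2(0) = \frac{(m+1)^2}{(2^m-(m+1))^2}\Big(\Big(\frac{4}{3}\Big)^m - \frac{m}{3} - 1\Big). \qquad (10)$$

Thus, for $\theta = 0$ the lemma is proved. 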
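Now using the "contiguity arguments" we will reduce the derivation of asymptotic normality 
under $\theta_n = h/\sqrt{n}$ to derivation under $\theta = 0$. First, applying the projection technique to the 
U-statistic $U_{m,n}$, we get 

$$\sqrt{n}\,\big(U_{m,n} - \mu_m(0)\big) = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\psi_0(X_i) + o_{P_0}(1),$$

where 

$$\psi_0(\mathbf{x}) = (m+1)\Psi_0(\mathbf{x}) = \frac{m+1}{2^m-(m+1)}\Big\{2\sum_{j=1}^{m}x_j + 2^m\prod_{j=1}^{m}(1-x_j) - (m+1)\Big\}.$$

Then, under $H_{1n} : h > 0$, Le Cam's third lemma implies (see [23, Sec. 7.5]) 

$$\sqrt{n}\,\big(U_{m,n} - \mu_m(0)\big) \xrightarrow{d} N\big(hE_0[\psi_0(X_1)\dot\ell_0(X_1)],\ E_0\psi_0^2(X_1)\big),$$

where $\dot\ell_\theta(\mathbf{x}) = (\partial/\partial\theta)\log f_\theta(\mathbf{x}) = \omega_m(\mathbf{x})/(1+\theta\,\omega_m(\mathbf{x}))$, $\theta \ge 0$. In other words, the statistic $U_{m,n}$ 
is approximately normally distributed with variance $n^{-1}E_0\psi_0^2(X_1) = n^{-1}\sigma_m^2(0)$, where $\sigma_m^2(0)$ is 
defined in (10), and mean value $\mu_m(\theta_n)$ such that 

$$\sqrt{n}\,\mu_m(\theta_n) = hE_0[\psi_0(X_1)\dot\ell_0(X_1)] = h\int_{I^m}\psi_0(\mathbf{x})\,\omega_m(\mathbf{x})\,d\mathbf{x}$$

$$= h\,\frac{m+1}{2^m-(m+1)}\int_{I^m}\Big\{2\sum_{j=1}^{m}x_j + 2^m\prod_{j=1}^{m}(1-x_j) - (m+1)\Big\}\,\omega_m(\mathbf{x})\,d\mathbf{x}.$$

Notice that 

$$\int_{I^m}x_j\,\omega_m(\mathbf{x})\,d\mathbf{x} = 0, \quad 1 \le j \le m, \qquad \int_{I^m}\omega_m(\mathbf{x})\,d\mathbf{x} = 0, \qquad \int_{I^m}\prod_{j=1}^{m}(1-x_j)\,\omega_m(\mathbf{x})\,d\mathbf{x} = \int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x},$$

where the first two equalities are consequences of boundary conditions (C2), and the third one 
follows from (5). Therefore 

$$\mu_m(\theta_n) = \frac{h}{\sqrt{n}}\,\frac{2^m(m+1)}{2^m-(m+1)}\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}.$$

The proof is completed. □ 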

Due to Lemma 1, the test based on $S_{m,n}$ rejects the null hypothesis of independence at level 
approximately $\alpha$ if $\sqrt{n}\,S_{m,n}/\sigma_m(0) > z_\alpha$, where $z_\alpha = \Phi^{-1}(1-\alpha)$ is the quantile of order $(1-\alpha)$ 
of a standard normal distribution. 
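The rejection rule can be written down in a few lines. The sketch below assumes the null variance $\sigma_m^2(0) = (m+1)^2\,((4/3)^m - m/3 - 1)/(2^m-(m+1))^2$ and takes an already computed value of $S_{m,n}$ as input; the function names and the default level are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def sigma2_null(m):
    """Asymptotic variance of sqrt(n) * S_{m,n} under independence."""
    return (m + 1) ** 2 / (2 ** m - (m + 1)) ** 2 * ((4 / 3) ** m - m / 3 - 1)

def reject_independence(s_value, n, m, alpha=0.05):
    """One-sided test: reject H0 when sqrt(n) * S / sigma_m(0) exceeds z_alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return sqrt(n) * s_value / sqrt(sigma2_null(m)) > z_alpha
```

For $m = 2$ the variance equals 1, which recovers the familiar normal approximation for the bivariate Spearman's rho.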

4 Asymptotic efficiency 

First, we calculate the Pitman efficiency of the test statistic $S_{m,n}$. Denote by $\gamma_{m,n}(\theta)$, $\theta = 
h/\sqrt{n} \ge 0$, the power function of the test of level approximately $\alpha$: 

$$\gamma_{m,n}(\theta) = P_\theta\big(\sqrt{n}\,S_{m,n}/\sigma_m(0) > z_\alpha\big).$$

If for a sequence of tests $\{T_n\}$ the corresponding sequence of power functions satisfies $\gamma_n(h/\sqrt{n}) \to 
1 - \Phi(z_\alpha - hs)$, for every $h \ge 0$, then the sequence $\{T_n\}$ is said to have slope (or efficacy) s. A 
widely recognized quantitative measure of comparison of two statistical tests is the square of the 
quotient of two slopes. This quantity is called the asymptotic relative efficiency (ARE) of the 
tests. Further, if the sequence of experiments $\{P_\theta^n : \theta \ge 0\}$ is LAN at $\theta = 0$, then an upper bound 
on the slope exists [23, Th. 15.4]. This yields the relative efficiency of the test with slope s and 
the best test and thus allows us to determine the absolute quality of the former. 

Lemma 1 implies that the sequence $\{S_{m,n}\}_{n\ge 1}$ is locally uniformly asymptotically normal. 
Then the general result on behavior of the local limiting power function, defined as 

$$\gamma_m(h) = \lim_{n\to\infty}\gamma_{m,n}(h/\sqrt{n}), \quad h \ge 0,$$

says that $\gamma_m$ depends on the sequence $\{S_{m,n}\}_{n\ge 1}$ only through the quantity $\mu_m'(0)/\sigma_m(0)$, the 
slope of the sequence of tests (see [23, Th. 14.7]). 

4.1 Relative and absolute measures of efficiency 

The next lemma gives an expression for the local limiting power function of the test at hand in terms 
of the dependence function $\Omega_m$. 

Lemma 2. Assume model (7) and let $\gamma_m(h) = \lim_{n\to\infty}\gamma_{m,n}(h/\sqrt{n})$. Then 

$$\gamma_m(h) = 1 - \Phi\Big(z_\alpha - \frac{2^m\,h\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}}{\big((4/3)^m - m/3 - 1\big)^{1/2}}\Big).$$

Proof. In view of Lemma 1, the proof follows immediately from Theorem 14.7 of [23]. □ 
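The local limiting power is straightforward to evaluate once $\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}$ is known. The sketch below implements the expression of Lemma 2 with slope $2^m\int\Omega_m/((4/3)^m - m/3 - 1)^{1/2}$; the function names are hypothetical, and the value $1/36$ used in the usage note corresponds to the bivariate dependence function $x_1x_2(1-x_1)(1-x_2)$, chosen here only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

def slope_S(m, omega_integral):
    """Slope of the test based on S_{m,n}; omega_integral is the integral of Omega_m over the unit cube."""
    return 2 ** m * omega_integral / sqrt((4 / 3) ** m - m / 3 - 1)

def local_power(h, m, omega_integral, alpha=0.05):
    """Local limiting power 1 - Phi(z_alpha - h * slope) at the alternative theta = h / sqrt(n)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_alpha - h * slope_S(m, omega_integral))
```

For example, `local_power(0, 2, 1/36)` returns the level $\alpha$, and the power increases with h; in this bivariate case the slope equals $1/3$.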



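From Lemma 2, the measure of efficiency for the sequence $\{S_{m,n}\}_{n\ge 1}$ is equal to 

$$\Big(\frac{\mu_m'(0)}{\sigma_m(0)}\Big)^2 = \frac{4^m}{(4/3)^m - m/3 - 1}\Big(\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2. \qquad (11)$$

For the multivariate Spearman-type statistics $W_{m,n}$ and $V_{m,n}$ these are (see [22, Sec. 4.1]) 

$$\Big(\frac{\mu_{m,W}'(0)}{\sigma_{m,W}(0)}\Big)^2 = \frac{4^m}{(4/3)^m - m/3 - 1}\Big(\int_{I^m}\prod_{j=1}^{m}x_j\,\omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2 \qquad (12)$$

and 

$$\Big(\frac{\mu_{m,V}'(0)}{\sigma_{m,V}(0)}\Big)^2 = 144\binom{m}{2}^{-1}\Big(\sum_{1\le i<j\le m}\int_{I^m}x_ix_j\,\omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2, \qquad (13)$$

respectively. The asymptotic relative efficiency of $S_{m,n}$ relative to $W_{m,n}$ and $V_{m,n}$ is then the quotient of (11) and (12), and of (11) and (13), respectively: 

$$\mathrm{ARE}(S,W) = \Big(\frac{\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}}{\int_{I^m}\prod_{j=1}^{m}x_j\,\omega_m(\mathbf{x})\,d\mathbf{x}}\Big)^2, \qquad \mathrm{ARE}(S,V) = \frac{4^m\binom{m}{2}}{144\,((4/3)^m - m/3 - 1)}\Big(\frac{\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}}{\sum_{1\le i<j\le m}\int_{I^m}x_ix_j\,\omega_m(\mathbf{x})\,d\mathbf{x}}\Big)^2.$$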
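At this point, recall that the sequence of models $\{P^n_{h/\sqrt{n}} : h \ge 0\}$ under consideration is LAN 
at h = 0. Therefore there exists an upper bound on the power function of the test (see [23, Th. 
15.4]). More precisely, for all $h > 0$, 

$$\limsup_{n\to\infty}\gamma_{m,n}(h/\sqrt{n}) \le 1 - \Phi\big(z_\alpha - h\sqrt{I_0}\big).$$

That is, the square root of the Fisher information $I_0 = \int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}$ is the largest possible slope: 

$$\frac{\mu_m'(0)}{\sigma_m(0)} \le \sqrt{I_0},$$

or equivalently, 

$$\frac{4^m}{(4/3)^m - m/3 - 1}\Big(\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2 \le \int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}. \qquad (14)$$

Therefore the Pitman absolute efficiency of the test based on $S_{m,n}$ is given by the formula 

$$e_S(\Omega_m) = \frac{4^m}{(4/3)^m - m/3 - 1}\Big(\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2\Big/\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}. \qquad (15)$$

For a given function $\Omega_m$, the closer the value of $e_S(\Omega_m)$ to one, the better the test based on $S_{m,n}$. 
Similarly, using (12) and (13), 

$$e_W(\Omega_m) = \frac{4^m}{(4/3)^m - m/3 - 1}\Big(\int_{I^m}\prod_{j=1}^{m}x_j\,\omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2\Big/\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}, \qquad (16)$$

$$e_V(\Omega_m) = 144\binom{m}{2}^{-1}\Big(\sum_{1\le i<j\le m}\int_{I^m}x_ix_j\,\omega_m(\mathbf{x})\,d\mathbf{x}\Big)^2\Big/\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}. \qquad (17)$$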
4.2 Extremal problem 

We are interested in finding the most favourable alternative, determined by the dependence 
function $\Omega_m(\mathbf{x})$, for which the sequence of test statistics $\{S_{m,n}\}_{n\ge 1}$ has the largest possible slope. 
This problem is reduced to the problem of finding $\Omega_m(\mathbf{x})$ that delivers equality in inequality (14). 
The latter is a particular case of a general m-dimensional extremal problem treated below. 

Let us introduce the space $C_0^m$ of functions that are m-times continuously differentiable with 
respect to each variable and obey certain boundary conditions: 

$$C_0^m = \{\Omega \in C^m(I^m) : \Omega(\mathbf{x})|_{x_j=0} = 0,\ j = 1,\ldots,m\}.$$

Define a scalar product on $C_0^m$ as follows: 

$$(\Omega_1,\Omega_2) = \int_{I^m}\omega_1(\mathbf{x})\,\omega_2(\mathbf{x})\,d\mathbf{x}, \quad \Omega_1,\Omega_2 \in C_0^m, \qquad (18)$$

where $\omega_i(\mathbf{x}) = \dfrac{\partial^m\Omega_i(\mathbf{x})}{\partial x_1\ldots\partial x_m}$, i = 1, 2. Denote by $H^m$ the closure of the space $C_0^m$ under the norm 
$\|\cdot\|$ induced by scalar product (18). For any $m \ge 2$, $H^m$ is a Hilbert space whose properties are 
immediately derived from those for m = 2 established in [15]. In particular, the embedding of 
$H^m$ into $C(I^m)$ is compact. Therefore, a function from $H^m$ equals zero on any "left" side of the 
cube $I^m$ adjacent to the origin. 

Recalling condition (C2) imposed on the dependence function $\Omega_m$, consider the problem of 
minimizing the functional $\int_{I^m}\omega^2(\mathbf{x})\,d\mathbf{x}$ on the subspace of $H^m$ specified by the boundary 
conditions on the "right" sides of $I^m$ adjacent to the point $\mathbf{1} = (1,1,\ldots,1)$ provided $\int_{I^m}\Omega(\mathbf{x})\,d\mu(\mathbf{x}) = 1$, 
with $\mu$ being a finite measure on $I^m$. In order to describe all possible boundary conditions of this 
extremal problem we need some notation. 

Let $M = \{1,2,\ldots,m\}$ and let $2^M$ be the set of all subsets of M. For any $U \subset M$, denote by $\mathbf{x}_U$ 
the $|U|$-dimensional vector $\mathbf{x}_U = (x_i : i \in U)$. Then, any possible set of the boundary conditions 
has the form 

$$\Omega(\mathbf{x})|_{\mathbf{x}_U=\mathbf{1}} = 0, \quad U \in \mathcal{M},$$

where $\mathcal{M} \subset 2^M$ is such that for any $U \subset V \subset M$, $U \in \mathcal{M}$ implies $V \in \mathcal{M}$. That is, if a set U 
belongs to $\mathcal{M}$, then all its "oversets" also belong to $\mathcal{M}$. The reason for this requirement is simple: 
if $\Omega \in H^m$ takes a zero value on the side $\{\mathbf{x}_U = \mathbf{1}\}$, it also takes a zero value on all the subedges 
of $I^m$ of lower dimension. 

Remark 1. For any $U \in 2^M$ define an m-dimensional vector of Boolean variables $(y_j = I(j \in 
U),\ j = 1,\ldots,m)$. Then $I(U \in \mathcal{M})$ is a monotone Boolean function [12]. Denote by N(m) the 
total number of such functions. Obviously, the number of the above considered extremal problems 
is also equal to N(m). So far, no explicit formula for N(m) as a function of m has been found. 
For asymptotic behaviour of N(m) as $m \to \infty$ see [14]. 
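For small m the number of admissible families can be obtained by brute force. The sketch below counts all families $\mathcal{M} \subset 2^M$ that are closed under taking oversets, encoding subsets of M and families of subsets as bit masks; this encoding and the function names are illustrative choices. The count includes the empty family and the family of all subsets, so it enumerates all monotone Boolean functions of m variables.

```python
def is_closed_under_oversets(family, m):
    """family is a bit mask over the 2^m subsets of M; subset U is itself a bit mask."""
    for subset in range(1 << m):
        if (family >> subset) & 1:
            for j in range(m):
                overset = subset | (1 << j)  # add element j (no change if already present)
                if not (family >> overset) & 1:
                    return False
    return True

def count_monotone_families(m):
    """Number of families of subsets of {1,...,m} closed under taking oversets."""
    return sum(is_closed_under_oversets(family, m) for family in range(1 << (1 << m)))
```

The enumeration visits $2^{2^m}$ candidate families, so it is only feasible for $m \le 4$; this rapid growth is consistent with the absence of a simple closed formula for N(m).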

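Return to the extremal problem of interest: 

$$\|\Omega\|^2_{H^m} \to \min, \quad\text{where}\quad \int_{I^m}\Omega(\mathbf{x})\,d\mu(\mathbf{x}) = 1, \qquad (19)$$

subject to the conditions 

$$\Omega \in H^m \quad\text{and}\quad \Omega(\mathbf{x})|_{\mathbf{x}_U=\mathbf{1}} = 0 \ \text{ for all } U \in \mathcal{M}. \qquad (20)$$

For a set $U = \{i_1,\ldots,i_l\} \subset M$ and its complement (in M) $V = \{j_1,\ldots,j_k\}$, $l + k = m$, put 

$$\partial\mathbf{x}_U\,\partial\mathbf{x}_V^2 = \partial x_{i_1}\ldots\partial x_{i_l}\,\partial x_{j_1}^2\ldots\partial x_{j_k}^2, \qquad d\mathbf{x}_U\,d\mathbf{x}_V = dx_{i_1}\ldots dx_{i_l}\,dx_{j_1}\ldots dx_{j_k},$$

and define the functions 

$$K_U(\mathbf{x},\boldsymbol{\xi}) = K_{i_1}(\mathbf{x},\boldsymbol{\xi})\ldots K_{i_l}(\mathbf{x},\boldsymbol{\xi}), \qquad k_V(\mathbf{x},\boldsymbol{\xi}) = k_{j_1}(\mathbf{x},\boldsymbol{\xi})\ldots k_{j_k}(\mathbf{x},\boldsymbol{\xi}), \quad \mathbf{x},\boldsymbol{\xi} \in I^m,$$

where 

$$K_j(\mathbf{x},\boldsymbol{\xi}) = \min(x_j,\xi_j), \qquad k_j(\mathbf{x},\boldsymbol{\xi}) = x_j\xi_j, \quad j = 1,\ldots,m.$$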
According to the Lagrange principle applied to a functional on a Banach space (see [1, Sec. 2.2.3]), 
the necessary condition of a minimum in (19)-(20) is reduced to the Euler-Lagrange equation 

$$(-1)^m\frac{\partial^{2m}\Omega(\mathbf{x})}{\partial x_1^2\ldots\partial x_m^2} = \lambda\,\mu(\mathbf{x}), \qquad (21)$$

and the natural boundary conditions 

$$\frac{\partial^{2m-|V|}\Omega(\mathbf{x})}{\partial\mathbf{x}_V\,\partial\mathbf{x}_{V^c}^2}\Big|_{\mathbf{x}_V=\mathbf{1}} = 0, \quad\text{for any } V \subset M,\ V \notin \mathcal{M}, \qquad (22)$$

where $V^c$ is the complement of V in M and the Lagrange multiplier $\lambda$ is found from the integral restriction in (19). The following result 
holds true. 

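Lemma 3. The solution to extremal problem (19)-(20) is given by the formula 

$$\Omega(\mathbf{x}) = \lambda\int_{I^m}G_{\mathcal{M}}(\mathbf{x},\boldsymbol{\xi})\,d\mu(\boldsymbol{\xi}),$$

where $G_{\mathcal{M}}$ is the Green function of boundary-value problem (20)-(22) equal to 

$$G_{\mathcal{M}}(\mathbf{x},\boldsymbol{\xi}) = K_M(\mathbf{x},\boldsymbol{\xi}) - \sum_{U\in\mathcal{M}}a_U\,K_{U^c}(\mathbf{x},\boldsymbol{\xi})\,k_U(\mathbf{x},\boldsymbol{\xi}), \qquad (23)$$

with the coefficients $a_U$ defined recurrently by 

$$\sum_{V\in\mathcal{M},\,V\subset U}a_V = 1, \quad\text{for all } U \in \mathcal{M}, \qquad (24)$$

and the constant $\lambda$ is given by 

$$\lambda^{-1} = \int_{I^m}\int_{I^m}G_{\mathcal{M}}(\mathbf{x},\boldsymbol{\xi})\,d\mu(\mathbf{x})\,d\mu(\boldsymbol{\xi}). \qquad (25)$$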



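Proof. First, note that 

$$\frac{\partial^2K_j(\mathbf{x},\boldsymbol{\xi})}{\partial x_j^2} = -\delta(x_j-\xi_j), \qquad \frac{\partial^2k_j(\mathbf{x},\boldsymbol{\xi})}{\partial x_j^2} = 0, \quad j = 1,\ldots,m,$$

where $\delta$ is the Dirac function. Therefore the function $G_{\mathcal{M}}$ in (23) satisfies 

$$(-1)^m\frac{\partial^{2m}G_{\mathcal{M}}(\mathbf{x},\boldsymbol{\xi})}{\partial x_1^2\ldots\partial x_m^2} = \delta(\mathbf{x}-\boldsymbol{\xi})$$

with an arbitrary choice of the constants $a_U$. The function $G_{\mathcal{M}}$ also satisfies natural boundary 
conditions (22). 

Taking into account (20), we arrive at recurrent system (24). Thus, the solution to boundary-value 
problem (20)-(22) is given by the formula stated in the lemma. 

It remains to note that the Lagrange multiplier $\lambda$ is found from the integral restriction in (19) 
and has the form (25). The lemma is proved. □ 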



Remark 2. Consider the following three sets of boundary conditions: (i) there are no restrictions 
on $\Omega \in H^m$ except for those that specify the space $H^m$, (ii) $\Omega \in H^m$ equals zero on any $(m-1)$-dimensional 
side of $I^m$, and (iii) $\Omega \in H^m$ equals zero at the point $\mathbf{1} = (1,\ldots,1)$. Then $\mathcal{M} = \emptyset$, 
$\mathcal{M} = 2^M$, and $\mathcal{M} = \{M\}$, respectively, and by Lemma 3 the corresponding Green functions are 

$$G_{\emptyset}(\mathbf{x},\boldsymbol{\xi}) = \prod_{j=1}^{m}\min(x_j,\xi_j), \qquad G_{2^M}(\mathbf{x},\boldsymbol{\xi}) = \prod_{j=1}^{m}\big(\min(x_j,\xi_j) - x_j\xi_j\big), \qquad G_{\{M\}}(\mathbf{x},\boldsymbol{\xi}) = \prod_{j=1}^{m}\min(x_j,\xi_j) - \prod_{j=1}^{m}x_j\xi_j.$$

These are covariance functions of the classical Gaussian random fields. They correspond to a 
Brownian sheet, a Brownian pillow, and a "pinned" Brownian sheet, respectively, that emerge 
as limiting processes in nonparametric testing of multivariate independence. For example, in the 
case m = 2, the functions $G_{\{M\}}$ and $G_{2^M}$ appeared in connection with finding the approximate 
Bahadur efficiency of independence tests based on the comparison of the multivariate empirical 
cdf $F_n$ with the product of margins $\prod_{j=1}^{m}F_j$ and with the product of empirical margins $\prod_{j=1}^{m}F_{j,n}$ 
(see [H Ch. 5] for details). 

4.3 Most favourable alternative to independence 

In order to determine the "optimal" distribution function for the sequence $\{S_{m,n}\}_{n\ge 1}$ we have to 
solve a variational problem of minimization of the functional $\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}$ on a set of functions 
of special type (see inequality (14)). Optimality conditions for the test statistic $S_{m,n}$ are given 
by the following theorem. 

Theorem. Let $F_\theta \in \mathcal{F}_m$. Then the sequence of test statistics $\{S_{m,n}\}_{n\ge 1}$ is Pitman optimal if 
and only if 

$$\Omega_m(\mathbf{x}) = C\prod_{j=1}^{m}x_j\Big\{\prod_{j=1}^{m}(2-x_j) + \sum_{j=1}^{m}x_j - (m+1)\Big\}, \qquad (26)$$

$\mathbf{x} = (x_1,\ldots,x_m) \in I^m$, $C > 0$. 



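Proof. The test based on $S_{m,n}$ is the "best" for those dependence functions that deliver 
equality in inequality (14). Thus, we minimize the functional $\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}$ on the space $H^m$ 
subject to 

$$\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x} = 1, \qquad \Omega_m(\mathbf{x})|_{x_{i_1}=\ldots=x_{i_{m-1}}=1} = 0, \quad 1 \le i_1 < \ldots < i_{m-1} \le m,$$

where the second constraint on $\Omega_m$ is a consequence of condition (C2). Therefore, with the 
notation of Section 4.2, 

$$\mathcal{M} = \{M,\ M\setminus\{1\},\ M\setminus\{2\},\ \ldots,\ M\setminus\{m\}\},$$

and the problem (20)-(22) takes the form 

$$(-1)^m\frac{\partial^{2m}\Omega_m(\mathbf{x})}{\partial x_1^2\ldots\partial x_m^2} = \lambda,$$

$$\int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x} = 1, \qquad \Omega_m(\mathbf{x})|_{x_j=0} = 0, \quad j = 1,\ldots,m,$$

$$\Omega_m(\mathbf{x})|_{x_{i_1}=\ldots=x_{i_{m-1}}=1} = 0, \quad 1 \le i_1 < \ldots < i_{m-1} \le m,$$

$$\frac{\partial^{2m-1}\Omega_m(\mathbf{x})}{\partial x_{i_1}\,\partial x_{j_1}^2\ldots\partial x_{j_{m-1}}^2}\Big|_{x_{i_1}=1} = 0, \quad 1 \le i_1 \le m,$$

$$\frac{\partial^{2m-2}\Omega_m(\mathbf{x})}{\partial x_{i_1}\partial x_{i_2}\,\partial x_{j_1}^2\ldots\partial x_{j_{m-2}}^2}\Big|_{x_{i_1}=x_{i_2}=1} = 0, \quad 1 \le i_1 < i_2 \le m,$$

$$\ldots$$

$$\frac{\partial^{m+2}\Omega_m(\mathbf{x})}{\partial x_{i_1}\ldots\partial x_{i_{m-2}}\,\partial x_{j_1}^2\partial x_{j_2}^2}\Big|_{x_{i_1}=\ldots=x_{i_{m-2}}=1} = 0, \quad 1 \le i_1 < \ldots < i_{m-2} \le m,$$

where in each condition $j_1, j_2, \ldots$ are the indices complementary to $i_1, i_2, \ldots$ in M. 

According to Lemma 3 the minimum of $\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x}$ is attained for the function 

$$\Omega_m(\mathbf{x}) = \lambda\int_{I^m}G(\mathbf{x},\boldsymbol{\xi})\,d\boldsymbol{\xi}, \qquad (27)$$

where 

$$G(\mathbf{x},\boldsymbol{\xi}) = \prod_{j=1}^{m}K_j(\mathbf{x},\boldsymbol{\xi}) - \sum_{i=1}^{m}\Big(K_i(\mathbf{x},\boldsymbol{\xi})\prod_{j\ne i}k_j(\mathbf{x},\boldsymbol{\xi})\Big) + (m-1)\prod_{j=1}^{m}k_j(\mathbf{x},\boldsymbol{\xi}),$$

with $K_j$ and $k_j$ as before. By homogeneity of inequality (14) the extremal function is defined up 
to a positive constant. Integrating in (27), with $\int_0^1\min(x_j,\xi_j)\,d\xi_j = x_j(2-x_j)/2$ and $\int_0^1x_j\xi_j\,d\xi_j = x_j/2$, yields (26). □ 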
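The optimality claim can be checked numerically by inserting the extremal function into the efficiency formula of Section 4.1. The sketch below takes C = 1, uses the mixed derivative $\omega_m(\mathbf{x}) = 2\sum_j x_j + 2^m\prod_j(1-x_j) - (m+1)$ of the function in (26), and approximates both integrals by a midpoint rule on a regular grid; the grid size and the function names are illustrative choices.

```python
from itertools import product
from math import prod

def Omega_opt(x):
    """Extremal dependence function (26) with C = 1."""
    m = len(x)
    return prod(x) * (prod(2 - t for t in x) + sum(x) - (m + 1))

def omega_opt(x):
    """Mixed derivative of Omega_opt with respect to all m variables."""
    m = len(x)
    return 2 * sum(x) + 2 ** m * prod(1 - t for t in x) - (m + 1)

def efficiency_S(m, grid=30):
    """Midpoint-rule approximation of the Pitman efficiency of S_{m,n} at Omega_opt."""
    nodes = [(k + 0.5) / grid for k in range(grid)]
    weight = grid ** (-m)
    int_Omega = weight * sum(Omega_opt(x) for x in product(nodes, repeat=m))
    int_omega_sq = weight * sum(omega_opt(x) ** 2 for x in product(nodes, repeat=m))
    return 4 ** m / ((4 / 3) ** m - m / 3 - 1) * int_Omega ** 2 / int_omega_sq
```

Up to discretization error, `efficiency_S(2)` and `efficiency_S(3)` are equal to one, in agreement with the Theorem. For m = 2 the extremal function reduces to $x_1x_2(1-x_1)(1-x_2)$.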

Remark 3. For the Spearman-type test statistics $W_{m,n}$ and $V_{m,n}$ the most favourable 
alternatives are specified by the dependence functions (see [22, Sec. 5]) 

$$\Omega_{m,W}(\mathbf{x}) = C\prod_{j=1}^{m}x_j\Big\{\prod_{j=1}^{m}x_j - \sum_{j=1}^{m}x_j + (m-1)\Big\}, \qquad \Omega_{m,V}(\mathbf{x}) = C\prod_{k=1}^{m}x_k\sum_{1\le i<j\le m}(1-x_i)(1-x_j), \quad \mathbf{x} \in I^m,\ C > 0,$$

respectively. The function $\Omega_{m,V}$, which corresponds to the pair-wise average Spearman's statistic 
$V_{m,n}$, determines an m-variate extension of the Farlie-Gumbel-Morgenstern distribution 
introduced in [10, Sec. 5.1]. 

4.4 Examples 

Now we examine, for several choices of cdf $F_\theta$, the Pitman efficiency of $S_{m,n}$ compared to the 
other two multivariate Spearman-type test statistics, $W_{m,n}$ and $V_{m,n}$. 

Example 1. Let $X_1,\ldots,X_n$ be independent copies of the equicorrelated random Gaussian vector 
$X = (X_1,\ldots,X_m)$ with $EX_i = 0$, $\mathrm{Var}X_i = 1$, and $\mathrm{Cov}(X_i,X_j) = \theta$, $1 \le i \ne j \le m$, so that the 
experiment $\{P_\theta^n : \theta \ge 0\}$ is normal. The first two terms of Taylor's expansion of the cdf of X 
around $\theta = 0$ are of the form (7) with the dependence function [S], [22] 

$$\Omega_m(\mathbf{x}) = \sum_{1\le i<j\le m}\varphi(\Phi^{-1}(x_i))\,\varphi(\Phi^{-1}(x_j))\prod_{k\ne i,j}x_k,$$

where $\varphi$ and $\Phi$ are the standard normal density and cdf. The mixed derivative of $\Omega_m(\mathbf{x})$ is 

$$\omega_m(\mathbf{x}) = \sum_{1\le i<j\le m}\Phi^{-1}(x_i)\,\Phi^{-1}(x_j),$$

and 

$$\int_{I^m}x_ix_j\,\omega_m(\mathbf{x})\,d\mathbf{x} = 1/(4\pi), \qquad \int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x} = m(m-1)/2,$$

$$\int_{I^m}\prod_{j=1}^{m}x_j\,\omega_m(\mathbf{x})\,d\mathbf{x} = \int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x} = m(m-1)/(2^{m+1}\pi).$$

Now applying (15)-(17) we obtain 

$$e_S(\Omega_m) = e_W(\Omega_m) = \frac{m(m-1)}{2\pi^2\big((4/3)^m - m/3 - 1\big)}, \qquad e_V(\Omega_m) = \frac{9}{\pi^2} \approx 0.9119.$$

The asymptotic efficiency of $S_{m,n}$ and $W_{m,n}$ decreases in m and equals 0.8207, 0.7349, 0.6548 
for m = 3, 4, 5, respectively, whereas the asymptotic efficiency of $V_{m,n}$ is a constant close to 1 
and independent of m. Thus, in the normal case, the average test based on $V_{m,n}$ is asymptotically 
more efficient than the multivariate Spearman's tests based on $S_{m,n}$ and $W_{m,n}$. 
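The efficiencies quoted for the Gaussian model follow from the closed-form expressions $e_S = e_W = m(m-1)/(2\pi^2((4/3)^m - m/3 - 1))$ and $e_V = 9/\pi^2$. The sketch below simply tabulates them; the names are hypothetical.

```python
from math import pi

def eff_S_gaussian(m):
    """Pitman efficiency of S_{m,n} (and W_{m,n}) in the equicorrelated Gaussian model."""
    return m * (m - 1) / (2 * pi ** 2 * ((4 / 3) ** m - m / 3 - 1))

EFF_V_GAUSSIAN = 9 / pi ** 2  # efficiency of V_{m,n}, the same for every m

table = {m: round(eff_S_gaussian(m), 4) for m in range(2, 6)}
```

For m = 2 all three statistics coincide with Spearman's rho, and `eff_S_gaussian(2)` equals `EFF_V_GAUSSIAN`, the classical value $9/\pi^2$.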

Example 2. Consider the multivariate extension of the Farlie-Gumbel-Morgenstern distribution 
for which the dependence function is 

$$\Omega_m(\mathbf{x}) = \prod_{k=1}^{m}x_k\sum_{1\le i<j\le m}(1-x_i)(1-x_j), \quad \mathbf{x} \in I^m.$$

In this case, the average pair-wise Spearman's test based on $V_{m,n}$ is Pitman optimal, i.e., 
$e_V(\Omega_m) = 1$ (see [22, Sec. 5]). The mixed derivative of $\Omega_m(\mathbf{x})$ is 

$$\omega_m(\mathbf{x}) = \sum_{1\le i<j\le m}(1-2x_i)(1-2x_j),$$

and 

$$\int_{I^m}\omega_m^2(\mathbf{x})\,d\mathbf{x} = \frac{m(m-1)}{18}, \qquad \int_{I^m}\Omega_m(\mathbf{x})\,d\mathbf{x} = \int_{I^m}\prod_{j=1}^{m}x_j\,\omega_m(\mathbf{x})\,d\mathbf{x} = \frac{m(m-1)}{9\cdot 2^{m+1}}.$$

Therefore, according to (15) and (16), 

$$e_S(\Omega_m) = e_W(\Omega_m) = \frac{m(m-1)}{18\big((4/3)^m - m/3 - 1\big)}.$$

Again, the test statistics $S_{m,n}$ and $W_{m,n}$ are equally efficient in the Pitman sense. Their 
asymptotic efficiency decreases as m increases, and equals 0.9000, 0.8060, 0.7181 for m = 3, 4, 5, 
respectively. 
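The values above can be reproduced from the general efficiency formula of Section 4.1. The sketch below assumes the integrals $\int\Omega_m = m(m-1)/(9\cdot 2^{m+1})$ and $\int\omega_m^2 = m(m-1)/18$ for the unnormalized dependence function $\prod_k x_k\sum_{i<j}(1-x_i)(1-x_j)$ and evaluates the efficiency in exact arithmetic; the function names are hypothetical.

```python
from fractions import Fraction

def pitman_eff_S(m, int_Omega, int_omega_sq):
    """Efficiency of S_{m,n}: 4^m (int Omega)^2 / (((4/3)^m - m/3 - 1) * int omega^2)."""
    denom = Fraction(4, 3) ** m - Fraction(m, 3) - 1
    return Fraction(4) ** m * int_Omega ** 2 / (denom * int_omega_sq)

def eff_S_fgm(m):
    """Efficiency of S_{m,n} under the multivariate FGM-type alternative of Example 2."""
    int_Omega = Fraction(m * (m - 1), 9 * 2 ** (m + 1))
    int_omega_sq = Fraction(m * (m - 1), 18)
    return pitman_eff_S(m, int_Omega, int_omega_sq)
```

Since the efficiency is invariant under rescaling of $\Omega_m$, the same values are obtained for any positive multiple of the dependence function; for m = 2 the result is exactly one.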

In both examples the asymptotic equivalence (in the sense of Pitman) of the tests based on 
$S_{m,n}$ and $W_{m,n}$ is explained by the fact that the corresponding cdfs in model (7) are radially 
symmetric, i.e., $F_\theta(\mathbf{x}) = \bar F_\theta(\mathbf{1} - \mathbf{x})$, in which case $S_m(F_\theta)$ and $W_m(F_\theta)$ are known to be equal (see 
[20, Sec. 3]). 